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ABSTRACT:  Intrinsically  disordered  proteins  are  characterized  by  their  large  manifold  of 
thermally  accessible  conformations  and  their  related  statistical  weights  making  them  an 
interesting  target  of  simulation  studies.  To  assess  the  development  of  a  computational 
framework  for  modeling  this  class  of  proteins,  this  work  examines  temperature-based  replica 
exchange  simulations  to  generate  a  conformational  ensemble  of  a  disordered  28-residue  peptide 
from  the  Ebola  virus  protein  VP35  starting  from  a  prefolded  helix-P-tum-helix  topology 
observed  in  a  viral  assembly.  The  simulation  strategy  tested  is  the  recently  refined 
CHARMM36m  force  field  combined  with  a  generalized  Bom  solvent  model  to  calculate 
probability  density  profiles  and  the  results  are  compared  to  an  equivalent  CHARMM22 
simulation  dataset.  The  assessment  is  further  extended  to  include  coarse-grained  lattice  Monte 
Carlo  simulations  to  determine  the  accuracy  of  a  reductionism  perspective.  The  analysis  finds 
CHARMM36m  to  correctly  shift  the  minimum  in  the  potential  of  mean  force  to  a  lower 
fractional  helicity  as  compared  to  CHARMM22,  however  in  both  simulation  models  the 
conformational  plasticity  along  the  helix-forming  reaction  coordinate  was  limited  by  tree-energy 
barriers.  By  comparison  the  coarse-grained  model  yielded  a  potential  of  mean  force  of  lower 
resolution  as  anticipated,  yet  the  model  successfully  showed  multiple  equally  weighted  low- 
energy  states  of  large-scale  conformational  heterogeneity  not  observed  in  the  all-atom  models. 
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1.  INTRODUCTION 

Intrinsically  disordered  proteins  (IDPs)  that  encompass  “Dark  Matter”  proteomes  play  a 
fundamental  role  in  the  regulation  and  function  of  protein  association  networks.1"4  Their 
hallmark  is  large-scale  conformational  heterogeneity  in  free  solution  while  finding  a  folded 
topology  upon  forming  a  multimeric  assembly.  An  illustrative  example  is  a  28-residue  peptide 
region  extracted  from  the  Ebola  virus  protein  VP35.5  The  X-ray  crystallographic  structure  of  the 
peptide  (designated  as  NPBP)  bound  to  the  Ebola  virus  protein  NP  shows  a  helix-p-tum-helix 
topology.  In  free  solution  NPBP  transitions  to  a  disordered  ensemble  as  observed  from  circular 
dichroism  (CD)  spectroscopy.5  What  makes  the  disorder-order  transition  of  NPBP  of  larger 
interest  is  when  added  to  a  solution  of  50%  trifluoroethanol  (TFE)  the  CD  spectrum  shows  the 
peptide  transitions  from  an  unstructured  ensemble  to  structures  containing  helical  folds.  This 
hidden  propensity  of  NPBP  and  conceivably  that  of  many  other  IDPs  makes  atomistic  simulation 
studies  challenging  to  capture  a  heterogeneous  conformational  ensemble  without  being  overly 
biased  by  the  susceptibility  to  fold.  The  challenge  is  amplified  by  the  inherent  deficiencies  of 
all-atom  force  fields  and  solvent  models  that  tend  to  over  stabilize  fold  propensities. 

The  topic  of  this  brief  study  is  to  test  the  recently  reported  CHARMM36m  force  field  in  a 
temperature-based  replica  exchange6  (T-ReX)  simulation  of  NPBP.  The  force  field  is  a 
refinement  of  an  earlier  version  to  improve  the  accuracy  in  polypeptide  backbone  conformational 
ensembles  for  IDPs.7  Although  the  development  of  CHARMM36m  is  primarily  intended  for 
explicit  solvent  simulations,  the  work  here  applies  the  force  field  with  a  generalized  Bom  (GB) 
solvent  approximation  represented  by  the  GBMV2  model.8  Given  the  continued  refinement  of 
force  fields  and  optimization  studies  of  GB  models,7’9  it  is  of  general  interest  to  determine  if  the 
combined  CHARMM36m/GBMV2  simulation  strategy  provides  a  framework  for  modeling 
IDPs.10"12  While  GBMV2  is  a  computationally  efficient  model  as  compared  to  explicit  solvent 
simulations  and  has  shown  to  accurately  reconstitute  the  thermal  stability  of  small  a-helical  and 
P-folded  proteins,13"16  questions  remain  if  implicit  solvent  models  can  correctly  shift  the  density 
of  states  of  a  prefolded  IDP  on  an  energy  landscape  with  multiple  kinetic  traps  to  an  ensemble  of 
disordered  states  favored  by  configurational  entropy.  To  help  with  the  assessment,  the 
CHARMM36m/GBMV2  simulation  results  are  compared  to  a  reassessment  of 
CHARMM22/GBMV2  simulation  trajectories  taken  from  an  earlier  study.17  The  comparison 
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centers  on  computing  potentials  of  mean  force  (PMFs)  using  the  parallel  tempering  weighted 
histogram  analysis  method  (PTWHAM)  and  the  multistate  Bennett  acceptance  ratio  (MBAR) 
method.19 

A  further  comparison  is  provided  of  the  all-atom  protein  simulation  models  with  a 
coarse-grained  (CG)  method.  The  method  is  based  on  low-resolution  lattice  Monte  Carlo 
simulations  and  explores  a  reductionism  strategy  to  modeling  IDPs.  The  modeling  approach  is  a 
revisit  of  earlier  work  by  Skolnick  and  coworkers  of  applying  the  side-chain-only  (SICHO) 
model  ’  to  NPBP.  Accurate  reconstruction  of  all-atom  protein  representations  from  CG 
conformations  is  applied  using  the  technique  developed  by  Feig  and  coworkers.22  Previous 
successful  application  of  the  SICHO  model  with  T-ReX  and  the  rebuilding  of  all- atom  structures 
is  illustrated  by  the  multiscale  refinement  of  protein  loops.  ’  Here,  PMFs  are  calculated  from 
the  CG  model  and  compared  to  the  CHARMM36m  and  CHARMM22  generated  conformations. 

2.  METHODS 

CHARMM-based  Simulations.  For  conformational  sampling  of  the  28-residue  NPBP  peptide, 
the  self-guided  Langevin  dynamics  (SGLD)  method  developed  by  Wu  and  Brooks25'26  was 
combined  with  T-ReX  using  a  strategy  first  reported  in  modeling  the  Trp-cage  mini-protein.14 
The  simulation  methodology  and  parameter  set  applied  here  are  similar  to  that  given  in  an  earlier 
study  of  the  NPBP  peptide  using  the  CHARMM22  force  field  with  the  GBMV2  solvent  model.17 
Here,  a  summary  of  the  approach  is  noted  of  applying  CHARMM36m. 

The  CHARMM  simulation  program27  (version  c41bl)  was  applied  using  the  utilities  and 
programming  libraries  of  the  Multiscale  Modeling  Tools  for  Structural  Biology  (MMTSB).28  An 
integration  time  step  of  2  fs  was  used  and  parameters  for  SGLD  consisted  of  the  friction  constant 
set  to  1  ps'1  for  all  heavy  atoms,  the  guiding  factor  X  to  a  value  of  1,  and  the  time  average  of  the 
momentum  was  set  to  1  ps.  Non-bonded  interaction  cutoff  parameters  for  electrostatics  and  non¬ 
polar  terms  were  set  at  a  radius  of  22  A  with  a  2- A  potential  switching  function.  Covalent  bonds 
between  the  heavy  atoms  and  hydrogen  atoms  were  constrained  by  the  SHAKE  algorithm.29  The 
GBMV2  parameters  were  selected  to  smooth  the  molecular  volume  by  setting  gbmvabeta  =  -12 
and  gbmvap3  =  0.65.  The  hydrophobic  cavitation  term  was  modeled  by  applying  a 
phenomenological  surface  tension  coefficient  set  to  a  value  of  0.015  kcal/mol/A2. 
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Simulations  were  carried  out  using  24  replica  clients  and  the  frequency  of  exchanges  was 
set  to  every  1  ps  of  simulation.  Temperatures  were  geometrically  spaced  between  Tmm  =  300  K 
and  Tmax  =  475  K.  The  NPBP  peptide  was  modeled  for  200  ns  of  simulation  time  per  client, 
generating  an  ensemble  of  4.8  ps.  Culled  structures  consisted  of  200000  per  temperature  and 
were  used  in  the  analysis  of  computing  the  secondary  structure  by  the  DSSP  algorithm30  and  the 

radius  of  gyration  ( Rg ).  PMFs  (denoted  as  the  measure  Wt  at  temperature  T)  were  computed 

using  order  parameters  of  fractional  helicity  (fa),  Rg  and  Z-score  of  the  potential  energies  as  input 
to  PTWHAM  and  MBAR  calculations.  It  should  be  noted  that  the  PMFs  generated  by  the  SGLD 
method  are  prone  to  small  deviations  from  a  canonical  description  due  to  the  ad  hoc  force  term 
added  to  achieve  enhanced  sampling  (see,  e.g.,  Refs.  14  and  26).  While  an  algorithmic  scheme  to 
reweight  the  biases  of  local  averages  of  forces  and  momenta  has  been  reported,26  the  application 
is  cumbersome  and  exceeds  the  purpose  of  this  work  where  the  deviation  is  thought  to  be  small 
for  a  peptide.17 

Further  analysis  of  the  generated  conformations  was  conducted  using  MMTSB  to 
evaluate  the  population  density  of  states  by  clustering  methods  that  included  hierarchical 
clustering  (conformational  and  fa  values)  and  /c-means  clustering  (conformational).  Additional 
examination  was  conducted  of  the  root-mean-square  deviation  (RMSD)  in  selected  backbone 
angles  ®  and  T  from  the  initial  folded  peptide  structure.  These  values  were  computed  across  the 
ensemble  for  two  helical  segments  and  provided  input  into  a  PTWHAM  calculation. 

Lattice  Simulations.  Chain  conformations  of  NPBP  were  generated  based  on  Monte 
Carlo  sampling  of  a  cubic  lattice  using  the  MONSSTER  program  developed  by  Skonick  and 
coworkers.  ’  The  SICHO  model  was  applied  where  each  amino  acid  is  represented  by  a 
single  virtual  particle  located  at  the  side-chain  center  of  mass  and  projected  onto  a  cubic  lattice. 
The  force  field  consists  of  potential  energy  terms  that  account  for  short-  and  long-range 
interactions,  hydrogen-bonding  cooperativity,  and  a  mean  force  potential  that  describes 
hydrophobic  interactions.  Each  force-field  term  is  constructed  of  sequence-independent, 
sequence-dependent,  and  restraint  components.  For  the  purpose  of  the  work  presented  here,  it  is 
noted  that  the  sequence-dependent  terms  were  derived  through  geometric  statistics  of  known 
protein  structures  and  account  for  short-range  interactions  between  nearest  neighbors  along  the 

90 

polypeptide  chain,  as  well  as  long-range,  pairwise,  and  soft-core  repulsive  interactions. 
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The  grid  size  for  the  cubic  lattice  was  set  at  a  value  of  125  lattice  units  in  each  direction 
at  a  resolution  of  1.45-A  grid  spacing.  Two  separate  simulations  were  conducted,  where  the 
parameter  stiff  that  controls  the  scaling  of  a  generic  potential  term  favoring  the  formation  of 
secondary  structure  elements  was  varied.  One  simulation  had  the  parameter  set  to  a  value  of  1 .0 
(where  the  default  is  1.25)  and  the  other  set  to  0.5.  Other  parameters  were  set  to  their  default 
values  given  in  the  MMTSB  description.  Each  of  the  two  simulations  was  started  from  the  PDB 
conformation  of  NPBP  and  the  sequence  file  annotated  with  the  DSSP  secondary  structure. 

The  number  of  lattice  T-ReX  simulation  cycles  at  each  temperature  was  set  to  20  and  the 
number  of  Monte  Carlo  moves  per  cycle  was  50.  Culled  conformations  were  extracted  from  24 
replicas  yielding  a  sampling  size  comparable  to  the  all-atom  simulations.  Replicas  were 
exponentially  spaced  from  a  reduced  temperature  T  of  1.0  to  2.4,  where  T  is  normalized  by  a 
reference  temperature  such  that  ffx  =  kff  represents  the  energy  unit  (where  &b  is  the  Boltzmann 
constant).  The  value  T  =  1  is  set  to  represent  the  distribution  of  conformations  modeled  by  the 
SICHO  force  field  at  approximately  300  K.  All-atom  structures  were  reconstructed  from  the 
lattice  simulations  by  using  MMTSB.  As  with  the  CHARMM-based  simulations,  PMFs  were 
calculated  and  clustering  of  conformations  was  conducted. 

3.  RESULTS  and  DISCUSSION 

The  initial  conformation  of  the  NPBP  peptide  (numbered  from  residues  20  to  47)  bound  to  Ebola 
virus  protein  NP  is  a  topology  of  a  helix-p-tum-helix  fold  and  shows  a  fractional  helicity  of/n  = 
0.43  from  DSSP.  The  two  helices  are  partitioned  as  Trp28  to  Thr35  and  Val40  to  Ile43.  The 
initial  fold  compactness  is  given  by  the  dimension  Rg=  10  A  and  appears  more  collapsed  than  an 
estimate  for  a  comparable  unfolded  28-residue  peptide  showing  Rg  ~  13  A.31 

Potentials  of  Mean  Force.  Illustrated  in  Figure  1  are  the  PMFs  at  300  K  evaluated  by 
using  PTWHAM  for  the  analysis  of  the  CHARMM-based  simulations  and  the  CG  model. 
Comparison  between  CHARMM36m  and  CHARMM22  shows  a  clear  distinction  in  population 
density  underlying  the  conformational  ensemble.  For  CHARMM36m,  the  minimum  in 

Wi(fH,Rg)  is  observed  at  ff  =  0.26  with  a  Rg  =  7.7  A,  while  CHARMM22  shows  the  minimum  at 
/h  =  0.53  and  Rs  =  9.8  A.  In  addition,  CHARMM22  shows  a  connecting  local  minimum  at  Wfffw 
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=  0.37,/?g  =  7.9  A)  with  a  free-energy  difference  of  only  0.05£bT  from  the  global  minimum.  To 
help  place  the  minima  in  perspective,  PTWHAM  of  conformations  taken  from  an 
explicit/implicit  solvent  hybrid  T-ReX  molecular  dynamics  simulation  study17  of  NPBP  with 
CHARMM22/TIP3P  combined  with  GBMV2  for  computing  the  Metropolis  exchanges  produced 

a  minimum  located  at  Wjifa  =  0.26, Rg  =  8.8  A).  While  CHARMM36m  with  GBMV2  is  nearly 

equal  to  that  observed  of  the  explicit/implicit  solvent  simulation,  the  issue  is  conformational 
heterogeneity  of  the  generated  ensemble. 

Overall  CHARMM36m  corrects  the  weight  of  helix  propensity  compared  to 
CHARMM22,  yet  the  population  density  of  both  models  remains  in  disagreement  with 
observations  from  CD  experiments  without  the  addition  of  TFE.  With  TFE,  the  experiments  and 
simulation  models  are  in  better  agreement  in  showing  preferred  helical  states,  however  sampling 
from  CHARMM36m  is  more  confined  than  CHARMM22  and  a  possible  kinetic  trap  is  revealed. 
The  latter  becomes  evident  in  comparing  the  energy  Z-score  landscapes,  where  CHARMM22 
simulation  shows  a  manifold  of  shuttling  conformations  among  the  major  basins  through  a 
pathway  of  favorable  exchanges  among  the  probability  distribution.  The  distinction  between  the 
two  force  fields  is  also  highlighted  from  the  difference  in  transition  from  the  global  minimum  in 

each  Wiifw,Rg)  to  an  unstructured  conformation  (fa  =  0,  computed  at  an  equivalent  Rg  as  the 

minimum).  The  analysis  shows  for  the  two  force  fields  that  CHARMM36m  yields  a  greater  free- 
energy  difference  of  ~2  k\(T.  An  advantage  for  CHARMM36m  is  that  transitions  from  the 
minimum  to  Wjif'w  =  0 ,Rg  >  10  A)  shows  less  frustration  than  CHARMM22. 

To  further  characterize  the  landscape  topology  and  explore  differences  in  applying  the 
CHARMM  force  fields  with  GBMV2,  Wj{fu,Rg)  was  recalculated  by  using  the  MBAR  method. 

The  analysis  shows  for  CHARMM36m  an  equivalent  minimum  Wj(fa  =  0.26,/C  =  8.0  A)  as 

calculated  from  PTWHAM  and  a  slight  reweighting  of  the  population  density  for  CHARMM22 
to  yield  Wjifa  =  0.46, Rg  =  8.1  A).  One  notable  difference  is  observed  for  CHARMM36m  in  that 

MBAR  yields  a  deeper  kinetic  trap  of  ~3  k^T  in  stabilizing  /H  =  0.26  when  transitioning  to 
unstructured  conformations. 
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It  is  important  to  note  that  the  GBMV2  model  is  an  accurate  approximation  of  the 
Poisson  implicit  solvent  model  and  applying  the  standard  protocol  of  fitted  parameters  yields 
the  good  agreement  with  explicit  solvent  simulations  of  calculating  the  charging  free  energy  of 
protein  conformations.33  Deviation  to  the  protocol  by  modification  of  Bom  radii  on  charged 
residues  and  reducing  the  surface  tension  of  modeling  the  hydrophobic  collapse  may  lead  to 
lessening  the  kinetic  trap  observed  in  Fig.  1A.  Additional  improvements  may  be  achieved  by 
incorporating  volume  and  surface  area  nonpolar  solvation  free  energy  terms34  combined  with 
adaptive  parallel  tempering  methods. 15,17,35  Alternately  a  unified  strategy  of  re-optimization  of 
the  GBMV2  solvent  force  field  for  CHARMM36  has  been  recently  proposed  by  Lee  and  Chen.9 
While  the  approach  appears  promising,  further  testing  is  needed  for  evaluating  the  adjusted 
GBMV2  parameters  for  various  applications,  including  thermal  stability  of  proteins  with 
different  fold  propensities. 

The  CG  model  simulation  results  of  scaling  the  SICHO  potential  term  favoring  the 
formation  of  secondary  structure  elements  by  the  stiff  parameter  of  1  (see  Methods)  is  shown  in 
Fig.  1C  of  using  reconstructed  all-atom  structures  from  the  lattice  generated  conformations. 
PTWHAM  of  the  simulation  data  is  presented  and  similar  results  were  obtained  from  MBAR. 
Analysis  reveals  the  CG  model  produced  significant  conformational  plasticity  among  multiple 
states  with  free-energy  barriers  <  0.01  kcal/mol.  While  is  of  lower  resolution  in 

defining  density  contours,  the  CG  model  exhibits  specificity  in  lattice  energies  as  displayed  in 
the  energy  Z-score  profile.  At  the  same  time  the  model  avoids  kinetic  traps  to  produce  a 
landscape  more  consistent  with  the  notion  of  a  disordered  ensemble  for  NPBP.  Given  the 
observed  conformational  heterogeneity,  a  simple  measure  of  sampled  space  is  the  statistical 
average  of  populations  at  T  =  1  and  is  given  by/H  =  0.42  ±  0.14,  where  the  value  approaches  the 
cluster  of  conformations  described  by  the  CHARMM22  minimum.  Reducing  stiff  to 

0.5  returns  an  average  of  0.1 1  ±  0.12.  In  both  CG  model  strategies,  the  simulations  unfold  and 
refold  NPBP  and  selection  of  parameters  near  the  default  values  offers  a  fair  assessment  of  the 
accuracy  in  modeling  heterogeneity  of  secondary-structure  formation. 

Structural  Analysis.  Displayed  in  Fig.  2  is  the  starting  NPBP  conformation  bound  to  the 
Ebola  virus  protein  NP.  As  noted  above,  NPBP  is  composed  of  two  helical  regions  denoted  as 
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al  of  residues  Trp28  to  Thr35  and  a2  for  residues  Val40  to  Ile43.  Illustrated  in  Fig.  2B  are 
conformations  taken  from  clustering  the  CHARMM36m  generated  ensemble  at  300  K.  A  further 
comparison  is  made  with  structures  from  CHARMM22  simulation17  (Fig.  2C)  and  a  set  of  all¬ 
atom  structures  reconstructed  from  the  CG  generated  conformations  (Fig.  2D). 

Analysis  of  the  extracted  structures  from  CHARMM36m/GBMV2  shows  the 
conformation  located  at  the  sampled  potential  energy  minimum  (Emin)  contains  a  short  helix  (fa  = 

0.26)  in  the  al  region.  This  structure  best  represents  the  minimum  in  Wjifa,Rg)  while  the 

remaining  structures  exhibit  varying  helical  lengths  in  al.  Unlike  CHARMM36m,  the  Emm  of 
CHARMM22/GBMV2  contains  a  helix-tum-helix  fold  of/H  =  0.43  showing  similarity  to  the 
bound  form.  For  the  CG  model,  contrary  to  possible  chain  distortions  and  entanglements  owing 
to  the  lower  resolution,  the  simulation  yielded  well  formed  structures  with  Emm  containing  a 

longer  helix  of  fa  =  0.43.  The  CG  landscape  of  Wjifa,Rg)  shows  a  multiple  transitions  among 
the  structures  where  fa  <  ~  0.5,  while  the  fold  positioned  at  fa  =  0.75  is  of  low  population. 

Figure  2E  reports  the  Ca  RMSD  of  structures  extracted  from  the  simulation  models  at 
300  K  relative  to  the  prefolded  NPBP  conformation.  The  analysis  shows  the  RMSD  values  are 
in  the  range  of  5-7  A  of  which  the  CG  model  reveals  the  greatest  net  value.  More  importantly, 
the  RMSD  profiles  lack  significant  disorder  of  round  trips  from  the  initial  prefolded  state  to  large 
Ca  excursions.  This  observation  has  implications  on  how  the  simulation  models  would 
represent  molecular  recognition  of  NPBP  by  the  protein  NP.  It  suggests  the  kinetics  of 
recognition  is  by  structural  reorganization  via  “induced- fit”  mechanism  of  helical  populations 
rather  than  the  slow  rate-limiting  step  of  capturing  a  completely  unfolded  state. 

Unlike  CHARMM36m,  the  CHARMM22  and  CG  models  show  conformations  with 
helical  folds  in  the  a2  region.  To  further  investigate  the  clustering  of  structures  and  the  limited 
folds  of  a  al-tum-a2  topology  found  by  CHARMM36m,  Fig.  3  illustrates  PTWHAM  assessment 
of  conformational  landscapes  of  RMSD  in  backbone  angles  <J>  and  T'  from  the  prefolded  peptide 
for  3-residue  segments  Ser30-Glu31-Gln32  of  al  and  Val40-Ser41-Asp42  of  a2.  The  analysis 
finds  for  al  (Fig.  3 A)  the  highest  populated  basin  to  be  located  at  small  RMSD  differences  while 
the  landscape  shows  considerable  conformational  populations  spread  out  among  the  profile.  For 
a2,  a  strikingly  different  result  is  obtained  where  the  landscape  shows  “hot”  population  regions 
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of  deviations  largely  removed  from  the  initial  structure,  indicating  weak  stability  in  the  short 
helix.  Interestingly,  a  further  analysis  shows  that  only  CHARMM36m/GBMV2  samples  a 
transient  p-hairpin  in  the  C-terminal  region  combined  with  al.  This  arrangement  found  by 
CHARMM36m  is  surprisingly  consistent  with  unbiased  predictions  of  secondary  structure  by  a 
consensus  approach.36 

Helix  Stiffness  and  Compactness  Propensity.  Illustrated  in  Figure  4  are  plots  of  helix 
formation  and  fold  compactness  as  a  function  of  sampling  temperatures.  Shown  are  statistical 
averages  over  datasets  for  CHARMM36m,  CHARMM22  and  the  CG  model.  For  the  latter, 
effective  temperatures  from  the  reduced  representation  of  the  lattice  simulation  model  were 
approximately  scaled  to  those  of  the  all-atom  simulations.  Also  shown  in  the  plots  is  an 
assessment  of  the  statistical  averages  for  the  CHARMM36m  simulation  by  overlaying  values  of 

/h  and  Rg  where  Wiifu,Rg)  =  0  along  the  temperature  profile  computed  from  MBAR. 

As  anticipated,  the  analysis  shows  the  CG  model  to  exhibit  the  weakest  cooperativity  and 
thermal  stability  in  helix  formation  among  the  simulation  models  (Fig.  4A).  While 
conformational  excursions  produced  by  the  CG  model  sampled  significant  helical  populations, 
the  statistical  average  is  located  among  multiple  low  free-energy  states  where  barriers  in 
Wjifn,Rg)  are  ~  0.01  kBT.  By  comparison,  the  results  show  CHARMM36m  to  retain  helical 

states  observed  in  the  Wj{fn,Rg )  minima  over  a  50-K  temperature  span,  before  the  secondary 

structure  unravels.  Of  the  two  all-atom  force  fields  and  their  thermal  profiles,  CHARMM36m 
provides  a  better  model  of  the  experimental  CD  observations.5 

One  of  the  concerns  of  applying  implicit  solvent  descriptions  to  modeling  large-scale 
conformational  heterogeneity  is  that  the  sampled  structures  will  be  overly  collapsed  (see,  e.g., 
comparison  of  solvent  models  in  Ref.  17).  Figure  4B  shows  all  three  simulation  models  are 
plagued  by  extended  compactness  of  the  generated  conformations.  As  a  noted  benchmark,  the 
prefolded  peptide  conformation  shows  a  Rg  ~  10  A  in  the  multimeric  complex.  A  further 
observation  in  Fig.  4B  reveals  significant  deficiency  in  the  CG  simulation  model.  While  the 
model  samples  Rg>  15  A  as  shown  in  the  Wjif^Rg),  the  SICHO  force  field  strongly  favors 
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compacted  topologies  during  replica  exchanges,  making  this  simulation  approach  less  attractive 
for  modeling  multistate  ensembles  of  thermal  unfolding  transitions. 

4.  CONCLUSIONS 

This  study  investigated  the  application  of  the  CHARMM36m  force  field  with  the  GBMV2 
implicit  solvent  model  in  a  replica  exchange  simulation  of  calculating  the  conformational 
ensemble  of  a  28-residue  IDP  from  the  Ebola  virus  protein  VP35.  Comparisons  were  made  with 
data  from  an  equivalent  CHARMM22/GBMV2  simulation  study  and  a  coarse-grained  model  of 
applying  a  lattice  Monte  Carlo  simulation.  The  central  issue  of  the  study  is  the  applicability  of 
potential  energy  functions  applied  in  parallel  tempering  algorithms  as  a  computational  approach 
for  modeling  large-scale  conformational  heterogeneity.  The  measure  of  success  was  the  accuracy 
to  replicate  a  disordered  conformational  ensemble  of  the  peptide  as  measured  from  CD 
experiments.  Starting  from  a  helix-tum-helix  topology,  the  results  revealed  that  CHARMM36m 
combined  with  GBMV2  produced  a  potential  of  mean  force  of  lower  fractional  helicity  than 
CHARMM22,  yet  neither  simulation  model  captured  significant  conformational  plasticity  along 
the  helix-forming  reaction  coordinate  between  unstructured  and  folded  conformations. 
Moreover,  the  models  displayed  a  helix  propensity  with  an  extended  thermal  stability  over  the 
ensemble  and  the  conformations  were  overly  collapsed  in  the  dimension  of  radius  of  gyration. 
Overall  the  study  demonstrated  that  the  accuracy  of  the  GBMV2  model  in  its  standard  protocol 
with  the  all-atom  force  field  CHARMM36m  is  limited  in  the  modeling  large-scale 
conformational  heterogeneity  of  IDPs.  The  likely  best  scenario  of  applying  the  GBMV2  model 
with  CHARMM36m  is  the  explicit/implicit  solvent  hybrid  replica  exchange  method  where 
peptide  conformations  are  generated  on  an  explicit  solvent  landscape.  By  comparison,  the 
coarse-grained  model  yielded  an  ensemble  of  thermally  accessible  states  showing  conformational 
disorder  in  the  potential  of  mean  force.  Like  the  all-atom  models,  the  lattice-generated 
conformations  were  collapsed  in  fold  space  on  the  manifold  of  highly  populated  states. 
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■  Figure  Legends 

Figure  1.  Conformational  and  energy  landscapes  computed  from  the  three  simulation  models 
where  the  order  parameters  are  radius  of  gyration,  fractional  helicity  and  Z-score  of  potential 
energies.  Plots  display  data  extracted  at  T=  300  K  for  the  all-atom  simulations  and  T=  1  for  the 
CG  lattice  Monte  Carlo  model.  Results  are:  (A)  CHARMM36m/GBMV2  simulation  results;  (B) 
CHARMM22/GBMV2  simulation  results;  and  (C)  CG  simulation.  The  free  energy  scale  is 
displayed  where  the  color  blue  represents  minima  in  the  PMFs  and  is  followed  in  order  by  colors 
green,  yellow  and  red,  where  the  latter  denotes  high  energy  states  of  low  population. 

Figure  2.  Analysis  of  structures  extracted  from  the  simulations.  (A)  The  initial  prefolded 
conformation  observed  in  the  viral  assembly  where  the  small  peptide  represents  NPBP  and  the 
molecular  surface  represents  Ebola  virus  NP.  Further  conformations  are  illustrated  taken  at  T  = 
300  K  from  the  CHARMM36m/GBMV2  simulation.  (B)  Conformations  extracted  from  the 
CHARMM22/GBMV2  simulation.17  (C)  Conformations  extracted  from  the  CG  model 
simulation  results  at  T=  1 .  (D)  Plots  of  conformation  index  versus  Ca-RMSD  (units  of  A)  at  the 
lowest  temperature  from  the  prefolded  conformation  where  the  blue  colored  line  and  gray  data 
set  represent  the  CHARMM36m/GBMV2  simulation  results,  red  colored  line  represents  the 
CHARMM22/GBMV2  simulation  and  the  black  colored  line  represents  the  CG  model. 

Figure  3.  Landscape  of  RMSD  in  ®  and  *P  (units  of  angle)  in  modeled  conformations  from 
CHARMM36m/GBMV2  simulation  compared  to  the  prefolded  NPBP  for  (A)  residues  Ser30- 
Glu31-Gln32  of  al  and  (B)  residues  Val40-Ser41-Asp42  of  a2.  Colors  applied  in  the  PMFs  and 
their  scales  are  noted  in  Figure  1. 

Figure  4.  Helix  stiffness  and  compactness  propensity  of  generated  conformations  by  the 
simulation  models  along  the  temperature  profile.  Line  with  red  color  circles  represents  a 
statistical  average  of  data  from  the  CHARMM36m/GBMV2  simulation,  blue  colored  line  and 

symbols  represent  values  of/n  and  Rg  where  Wi(fn,Rg)  =  0  for  CHARMM36m/GBMV2  dataset, 

long-dashed  line  represents  CHARMM22/GBMV2,  and  short-dashed  line  represents  the  CG 
model.  (A)/h  versus  T;  and  (B)  Rg  versus  T.  Standard  deviation  for  CHARMM36m/GBMV2 
simulation  generating /H  at  T  =  300  K  is  ±  0.10  and  ±  0.02  at  T  =  475  K;  for  Rg  the  standard 
deviation  is  ±  0.38  A  at  T  =  300  K  and  for  T  =  475  K,  deviation  is  ±  2.64  A.  Error  in 
determining  the  minima  in  Wiif^Rg)  along  the  temperature  coordinate  is  —  0.1  kcal/mol. 
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